For the past several years, researchers at NASA Langley have been engaged in a series
of projects to study the degree to which existing facilities and capabilities, originally
created for work on full-scale aircraft, are extensible to smaller scales: those of the small
unmanned aerial systems (sUAS, also UAVs and, colloquially, ‘drones’) that have been
showing up in the nation’s airspace of late. This paper follows an effort that has led to
an initial human-subject psychoacoustic test regarding the annoyance generated by sUAS
noise. This effort spans three phases: 1. The collection of the sounds through field recordings.
2. The formulation and execution of a psychoacoustic test using those recordings. 3.
The initial analysis of the data from that test. The data suggest a lack of parity between
the noise of the recorded sUAS and that of a set of road vehicles that were also recorded
and included in the test, as measured by a set of contemporary noise metrics. Future
work, including the possibility of further human subject testing, is discussed in light of this
suggestion.


I. Introduction 


The ongoing proliferation of small unmanned aerial systems (sUAS, read “small U.A.S.”) has captured the
imaginations of many, from single hobbyists to entrepreneurs working for some of the largest companies
on the planet. In the United States, a large number of applications for sUAS are now open for commercial
exploration given the FAA’s changes in policy over the past several years.¹ It may be that, in a short while,
communities across the US will be inundated with new classes of noise due to sUAS operations that they
had not before encountered.

The Design Environment for Novel Vertical Lift Vehicles (DELIVER) project at NASA has been working 
to determine the feasibility of producing a conceptual design tool for sUAS that encapsulates NASA’s 
capabilities in the field of full-scale rotorcraft and fixed-wing aircraft design. This tool would bring together 
estimations of the performance of a proposed vehicle (speed, range, etc.) as well as the environmental impact 
(i.e., noise and annoyance; though also efficiency, emissions, etc.). 

In order to incorporate annoyance into such a tool, the predicted physical
sounds generated by sUAS must be related to the annoyance that those sounds would be expected to generate
in human listeners. To date, there have not been any objective studies published to gain even a coarse view
of annoyance due to sUAS noise specifically. Further, it is clear that the noise of these machines does not 
resemble, qualitatively, the noise of contemporary aircraft. This difference in sound quality introduces an 
unknown factor into the prediction of the resultant annoyance. This paper describes a line of research which 
seeks to remedy that shortcoming. 


Research Premise 


In order to formulate a psychoacoustic test, it is necessary to first define a plausible research question 
that such a test might answer. A common early expectation was that multi-rotor sUAS would be used to 
deliver packages to residential communities (see, e.g., Clarkson’s discussion of the topic from late 2015²). A
reasonable expectation might be that noise from such sUAS operations will be able to be directly compared
to the noise produced by the contemporary machines used to perform the same task in society. That is, the
assumption can be tested that there is nothing about the sound of sUAS that implies that it need be treated 
in some special way vis a vis noise from, for instance, a delivery truck in a residential neighborhood. This 
premise, while quite broad, is easily tested through “single-event” psychoacoustic methods that have been 
in use for decades (see, e.g., Shepherd³).

Generally, these methods involve collecting or creating various samples of noise to be tested. These sample
sounds are then used to formulate a test to which a cross-section of the public is invited as human
subjects. Those subjects are exposed to the noise samples one by one, and their response to each noise,
in terms of annoyance, is solicited individually. The resultant data are then analyzed in various ways
in order to answer targeted research questions.


Organization 


This paper consists of three main parts corresponding to the steps that have been outlined: It first describes 
an effort to record the noise of various sUAS in operation. Second, it describes the initial psychoacoustic test 
on human subjects that employed these recordings, as well as other sounds. Last, it details the initial analysis 
of the resulting human subject data and offers two interpretations that will help guide future research in 
this vein. 


II. Sound Collection 


This section describes the collection of sounds for inclusion in the psychoacoustic test. First, it describes 
the recording of a range of multi-copter sUAS executing flyover operations above an array of microphones 
on or near the ground. It then describes the recording of a number of road vehicles that are meant to be 
representative of private and commercial operations in residential communities. Finally, the sources of several 
auralizations — completely synthetic sounds based on aeroacoustic predictions of sUAS/aircraft operations 
— that are included in the test are discussed. 


A. Recordings 


The bulk of the test consisted of recorded sounds, which were collected at three locations around the
U.S. between September 2016 and January 2017.


1. Oliver Farms


The first set of recordings involved NASA-owned, commercially available sUAS. These were recorded during 
flights that took place on a small grass airstrip, referred to as Oliver Farms, bordered by sorghum fields near 
Smithfield, VA during late September, 2016. 

Photos of the three vehicles flown at Oliver Farms are shown in Fig. 1. They include, from left to right in
the figure, the Drone America DaX8 octocopter, the Stingray 500 variable-pitch quadcopter (VPV), and the
DJI Phantom 2 fixed-pitch quadcopter. Characteristics, including vehicle weight, type, and control method
during the flights, are listed in Table 1.
(a) DaX8 (b) Stingray 500 (VPV) (c) Phantom 2


Figure 1. Photos of sUAS recorded at Oliver Farms.
Table 1. Attributes of sUAS recorded at Oliver Farms.


Vehicle Name        | Type                      | Weight (kg) | Control Method
DaX8⁴               | fixed-pitch octocopter    | ~8          | autopilot
Stingray 500⁵ (VPV) | variable-pitch quadcopter | ~2          | manual
DJI Phantom 2⁶      | fixed-pitch quadcopter    | 1.6         | autopilot


Nominal flight trajectories for the vehicles consisted of straight-and-level flyovers at 5 and 10 m/s forward
flight speed at various altitudes above ground level (AGL). The flight paths were aligned with the long
dimension of the runway. If an sUAS was incapable of 10 m/s, the vehicle’s highest practical speed was used.
Actual speeds and altitudes obtained during the recordings varied depending on vehicle and pilot/autopilot
capabilities. Vehicle state information, including roll, pitch, yaw, and GPS position, was recorded with a
detachable 3DR Pixhawk flight data acquisition system (FDAS).⁷ This system recorded position at a 5 Hz
rate with nominal GPS accuracy of 4 m.⁸

The control method in Table 1 refers to whether the vehicle was remotely controlled by a human pilot 
(manual piloting) or by an on-board flight controller (autopilot). For manual piloting, the pilot stood on the 
side of the flight path and did his best to maintain a steady altitude and flight speed over a 120 to 300-meter 
long flight path centered on the runway. In this case the length of the flight path depended on the pilot’s 
comfort with the vehicle’s flying characteristics and visibility at the flight path extremes. For example, the 
Stingray 500 vehicle was difficult to handle and as a result the flyovers for that vehicle were shorter and had 
greater variability than the other vehicles. 

The DJI Phantom 2 was flown with three different bladesets in order to capture possible noise signature 
variations due to the blades. These blades include the standard OEM blades delivered with the vehicle, a 
carbon fiber set, and a “slow flyer” propeller manufactured by Advanced Precision Composites. Differences 
between these bladesets are described in more detail by Zawodny.⁹


The methods for recording acoustic and flight data were similar to those used during previous NASA
sUAS recording efforts.¹⁰ The sUAS flyover noise was recorded with three microphones. Two of the
microphones were placed directly beneath the flyover path of the sUAS: one on a tripod 1.2 m AGL and the
second on a 0.4 m diameter rigid plastic ground board directly below the tripod microphone. The third
‘sideline’ microphone, also on a rigid ground board, was displaced from the runway centerline 10 m, along
the short dimension of the runway. GPS coordinates of the microphones were measured using a u-blox EVK
7-P evaluation kit.¹¹ All recordings were made using GRAS 40AQ random incidence 1/2-inch prepolarized
condenser microphones coupled with GRAS 26CA constant current power preamplifiers. The preamplifiers were
connected to a GRAS 12AX 4-channel power module that provided the constant current needed. For the
Oliver Farms test, the microphone responses were digitized using a 5-channel National Instruments NI-4432
USB module at a 20 kHz sample rate. The resulting sampled data were streamed to the hard drive of a
connected laptop computer.

The audio recording hardware simultaneously sampled an analog IRIG-B timecode signal to enable time 
synchronization between the GPS vehicle position data and the acoustic data. The analog timecode signal 
was demodulated and decoded to obtain a UTC-synchronized time signal for the sampled acoustic data. 
Seventeen seconds were subtracted from the GPS time in the vehicle state data to account for leap seconds
in the GPS data as of late 2016.¹²
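The leap-second correction can be illustrated with a short sketch. The function name and the example timestamp below are hypothetical and are not taken from the flight logs; the only fact relied upon is that GPS time led UTC by 17 s from 1 July 2015 through 31 December 2016.

```python
from datetime import datetime, timedelta

# GPS-UTC offset in effect during the late-2016 recordings (assumed constant here).
GPS_UTC_LEAP_SECONDS = 17


def gps_to_utc(gps_time, leap_seconds=GPS_UTC_LEAP_SECONDS):
    """Shift a GPS-timescale timestamp onto UTC by removing accumulated leap seconds."""
    return gps_time - timedelta(seconds=leap_seconds)


# Hypothetical position fix stamped in GPS time (not a value from the test).
fix_gps = datetime(2016, 9, 28, 14, 30, 17)
fix_utc = gps_to_utc(fix_gps)
```

After this shift, the vehicle state samples share a time base with the IRIG-B decoded acoustic data, so the two can be aligned sample by sample.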

Wind conditions were generally calm for these flights, with winds less than 10 knots. A large number of 
cicadas and birds were present in the woods that bordered the field. This was an unfortunate noise source 
that necessitated attention during the test design phase, as discussed below. 


2. San Diego 


Additional sUAS recordings were taken later in 2016, in the Cleveland National Forest, about 35 miles NE 
of San Diego, CA. These flights were conducted with help from Straight Up Imaging, a San Diego firm 
which builds and operates sUAS for imaging/surveying purposes. These flights recorded the SUI flagship 
Endurance sUAS, shown in Figure 2 (referred to simply as “SUI” for the remainder of this document). This 
model weighs approximately 3.2 kg unloaded. The Endurance performed auto-piloted straight-and-level
flyovers at 5 and 10 m/s speed, and at 20, 30, 50, and 100 m AGL. The SUI flyovers were much more tightly
controlled than the flyovers at Oliver Farms. Winds were calm on the day of the test, and ambient noise 
was significantly lower than at Oliver Farms. 

The microphones and associated equipment were equivalent to those used at Oliver Farms, except that
the microphone responses were recorded using a TASCAM DR-701D 6-track field recorder in
uncompressed PCM format.¹³


Figure 2. Photo of the SUI Endurance quadcopter.


3. Langley Research Center


Ground vehicle recordings were made at NASA Langley on a quiet weekend in January. This set of
recordings captured drive-bys of four road vehicles, all in good mechanical condition, shown in
Figure 3 and described in Table 2, on a long stretch of flat and straight road. The target test
condition was a 10 m/s drive-by of the tripod-mounted microphone, which was approximately 10 m
from the centerline of the vehicle’s path. The drivers were instructed to maintain a constant
speed while passing the microphones to minimize engine noise associated with acceleration. The
recordings were dominated by tire noise, as well as some low frequency engine noise for the
larger vehicles. As it was winter in Virginia, there was considerably less background noise from
local fauna than at Oliver Farms. The recording equipment was identical to that used in San Diego. Ground
vehicle position information was recorded with the u-blox EVK 7-P evaluation kit.¹¹


Table 2. Ground Vehicles


Make                    | Model                  | Description
Subaru                  | Impreza Sport          | Passenger hatchback
Ford                    | Econoline 350          | Utility van
International Harvester | MaxxForce DT DuraStar  | 20’ box truck (diesel)
Grumman                 | Kurbmaster/Utilimaster | Step van


B. Auralizations 


Although the results discussed in this paper concern the recorded vehicle sounds only, the sounds presented to
the test subjects included additional computer-generated sounds, or auralizations. These auralizations were
included for comparison with previous human subject tests from which they were taken, and in anticipation
of follow-on testing using auralizations of sUAS.


(a) Subaru Impreza (b) Utility Van and Box Truck (c) Step Van 


Figure 3. Road vehicles included in Langley recordings. 
1. Quadcopter


Sounds were included from a group of auralizations that were generated based on a computer simulation
of quadcopter flight through atmospheres of varying depths of realism.¹⁴ The dynamics of the simulated
quadcopter were modeled so that the various physical forces acting on it could be added/removed easily
in order to study the isolated/combined effect they had on the resultant acoustics. These effects include
first- and second-order drag forces, the effect of atmospheric turbulence, and the effect that manufacturing
tolerances at the component level might have on the operation of a quadcopter.

These auralizations have been used in demonstrations at Langley, so that a large amount of anecdotal 
response data to the noise has been gathered already. It is therefore instructive to include these sounds 
in a formal study of psychoacoustic annoyance, though they are not expected to be directly comparable to 
their real/recorded counterparts. Additionally, it is likely that this auralization capability will find use in 
future psychoacoustic studies, as it gives the experimenter the ability to control the physical and acoustical 
properties of UAV flight that may not be possible in the real world. 


2. DEP Vehicle 


Another group of auralizations that were included in the test came from the Distributed Electric Propulsion
psychoacoustic test that took place in 2015.¹⁵ These sounds represent a small civil aviation plane that
employs a varying number (6, 12, or 18) of high lift/low noise electric propellers on the leading edge of
the wings. These propellers are controlled in such a way that there may be small prescribed differences
in rotational speed between adjacent propellers. In addition, the effect of atmospheric turbulence on the
phase-stability of the sounds from the various propellers was modeled. The resulting phase and frequency
differences between these sound sources can give the predicted noise of the DEP vehicle a variety of unique
characteristics that are, perhaps, most appropriately described as ‘Jetsons-like.’

The DEP psychoacoustic test employed a similar testing modality to that used here: subjects were
played a large number of single sounds and asked to rate their annoyance on a single continuous scale.
Accordingly, these sounds were primarily included in order to study the parity between the responses from
the DEP test and those found in this sUAS test. This analysis may be the subject of future efforts, and
details of the responses to the DEP sounds are not included in the analysis presented here.¹⁶


III. Psychoacoustic Test 


This section details the formulation and execution of the psychoacoustic test based on the collection
of recordings/auralizations discussed above. There are several steps to this process: First, one or more
specific research questions to be answered by the test must be formulated. Next, a subset of the recorded
sounds designed to address these questions must be determined. Then, the process by which individual
sounds are distilled from their ‘raw’ recorded form into one ready to be presented to subjects is presented.
Finally, details of the execution of the test and the facility used are given.


A. Research Questions 


The research questions that will guide the selection of the test signals are three-fold:


1. How do subjects interact with the task of rating annoyance in general? At a basic level, this question
can be answered by observing, for instance, the amount of variance that exists between responses to
the same sound when that sound is repeated at various times during the test. This also includes
observation of the variance that exists between subjects for the same sound. The technique of analysis
of variance (ANOVA) is used to answer these questions.


2. How do the operational factors impact annoyance? Examples of these factors are the vehicle type, 
speed, and altitude. Again, ANOVA would be used primarily to answer this question. 


3. Is there an observable difference between the annoyance produced by the set of sUAS used in this 
study and that produced by the road vehicles used? This question is most akin to the premise of this 
research as discussed in the introduction. It will be addressed here by the application of several forms 
of linear regression. 
Given that subjects will simply be asked to rate their annoyance to a large set of sounds on some
subjective scale, there may be many more questions that various types of further analysis may support,
though no conjecture as to what those might be is offered here.
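As a rough illustration of how ANOVA bears on the first two research questions, the sketch below computes a one-way F statistic, the ratio of between-group to within-group mean squares. The one-way layout, the function name, and the ratings are all assumptions made for illustration; the paper does not specify the ANOVA design, and a repeated-measures design may well be more appropriate for ratings gathered from the same subjects.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of ratings."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings about their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up ratings of one sound at three presentations (not data from the test).
ratings_by_presentation = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(ratings_by_presentation)
```

A large F relative to the appropriate F distribution would suggest that the grouping factor (here, the presentation; in the second question, vehicle type, speed, or altitude) explains more variation than chance alone.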


B. Signal Selection 


Using these research questions as guidance, the set of sounds to be included in the psychoacoustic test was
formed from the set of recordings. The final selection is shown in Table 3. Every row in Table 3
represents a single flyover sound. The ID numbers used in Table 3 come from those given to the recordings,
so that they do not form a contiguous set in the test, nor are they listed in order in the table. Repeated
sounds were given unique ID numbers; for instance, numbers 1, 51, and 61 all represent the same sound, which is
presented to the subjects multiple times throughout the test. In this way, the first research question can
be addressed by all of the rows in Table 3 that have multiple IDs. Further, rows 1-3 and 3-6 constitute sets 
of recordings made of the same nominal parameters (vehicle type, speed, etc.), in order to explore whether 
repeated observations of the same operations produce equal annoyance. 

All recorded sounds are taken from the tripod microphone except for IDs that are in the range of 100-199, 
which are taken from the sideline ground board microphone. The last two digits of these IDs correspond to 
their counterparts of the 0-99 range (e.g., ID 1 is the same flyover as ID 101). The inclusion of this set is 
meant to determine whether observations at slightly different locations (and without interference from the 
ground plane) produce significantly different results. This was only explored for the SUI vehicle. 

For the sUAS included in the test, the ‘Configuration’ column of Table 3 indicates the blade set used.
For the SUI vehicle, OEM-2 refers to the two-bladed configuration, and OEM-3 refers to the three-bladed
configuration. For the Phantom 2, OEM refers to the standard bladeset, CF to the carbon fiber blades,
and APC to the Advanced Precision Composites blades. The height and speed values listed are nominal, as 
indicated in the previous discussion regarding the differences between manual and autopilot control. 

All of the SUI samples were presented to the subjects at their original recorded level. For the other sUAS
included in the test, the level was manipulated in order to produce a range of around 15 dBa max per vehicle.
As some sUAS were naturally quieter than others for the same flight condition, this produced a test that
was primarily contained within a 20 dBa max range. This spread was intended to be wide enough to provide
sufficient range for the subjects and for the linear regression, but not so wide as to generate contraction
effects on the response scale (see, e.g., Bech¹⁷).

Gains were also applied to the road vehicle recordings. Given that they were all recorded at the same 
operating condition, there was very little observed variation between the recordings of the same vehicle. The 
span within and between road vehicle samples was made to be roughly equal to that of the sUAS samples 
in terms of dBa. For the road vehicles, the values in the height column indicate the nominal distance from 
the microphone to the centerline of the vehicle. 
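The level manipulation described above amounts to scaling the pressure-time signal by a constant. A minimal sketch follows; the function names and the synthetic tone are hypothetical, and the level computed here is a simple unweighted RMS level rather than the A-weighted maximum used in the test. Because the scaling is uniform, either metric shifts by the same number of dB.

```python
import math


def apply_gain_db(pressure, gain_db):
    """Scale a pressure-time signal (Pa) by a gain expressed in dB."""
    scale = 10.0 ** (gain_db / 20.0)
    return [scale * p for p in pressure]


def level_db(pressure, p_ref=20e-6):
    """Unweighted RMS sound pressure level in dB re: 20 micropascals."""
    rms = math.sqrt(sum(p * p for p in pressure) / len(pressure))
    return 20.0 * math.log10(rms / p_ref)


# Synthetic 200 Hz tone standing in for a recording (illustrative only).
signal = [0.02 * math.sin(2 * math.pi * 200 * n / 44100) for n in range(4410)]
quieter = apply_gain_db(signal, -6.0)
shift = level_db(quieter) - level_db(signal)  # expected to be -6 dB
```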

Once a recording was chosen for inclusion, a start and stop point were determined to extricate the sample
from the larger recording. These points were determined by observing the maximum dBa level reached by
the event, and choosing points in time to either side of that maximum that corresponded to the same
dB-down level. This level was set between 10 and 20 dB down in nearly all cases. For example, if the maximum
was 65 dBa, then the sound may have been selected so that it started and stopped around 50 dBa. This
level was often determined by the presence of extraneous sounds (e.g., birds chirping) that became clearly
audible as the level of the flyover decreased. In some cases, particularly the high-altitude sUAS cases, points
at less than 10 dB down were selected due to the extreme lengths of sounds that would have been produced
by following this strategy. This process resulted in the lengths of the samples as noted in Table 3.
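The dB-down selection can be sketched as a search outward from the peak of a level time history. The function name and the level values below are hypothetical; in practice the threshold was chosen per recording, as described above, rather than fixed.

```python
def db_down_window(levels_db, down_db):
    """Return (start, stop) frame indices bracketing the peak at `down_db` below it."""
    peak_idx = max(range(len(levels_db)), key=levels_db.__getitem__)
    threshold = levels_db[peak_idx] - down_db
    start = peak_idx
    # Walk backward while the level stays at or above the threshold.
    while start > 0 and levels_db[start - 1] >= threshold:
        start -= 1
    stop = peak_idx
    # Walk forward under the same condition.
    while stop < len(levels_db) - 1 and levels_db[stop + 1] >= threshold:
        stop += 1
    return start, stop


# Made-up level history (dBa per frame) peaking at 65 dBa.
levels = [40.0, 45.0, 50.0, 58.0, 65.0, 60.0, 52.0, 49.0, 44.0]
window = db_down_window(levels, 15.0)  # frames at or above 50 dBa around the peak
```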

Four versions of the quadcopter auralization were included. As indicated in Table 3, each was presented to 
the subjects three times. The conditions 1-4 indicate various realistic effects included in the flight dynamics 
simulations that were run as input to the auralizations (again, see Christian¹⁴). These correspond to,
cumulatively: 1. No dynamical effects. 2. Drag effects on the body and rotors. 3. A model of turbulence 
acting on the sUAS. 4. Sources of random error included between the thrust coefficients of the four rotors. 

A set of nine auralizations of the DEP vehicle was included. These were of simulated flyovers at
300 m AGL, with a speed of 31 m/s. This gives a receiver-angle time history somewhat similar to that of
the sUAS recordings.
Table 3. Index of sounds included in the psychoacoustic test. See text for full description. N.B., the ID
numbers are not necessarily in ascending order.


C. Presentation Order 


Combined, there were 103 non-unique sounds in the test, resulting in about an hour of test time. Through
experience with previous tests, it has been determined that a subject should be given breaks in their listening
task intermittently; they should not be asked to listen to annoying sounds for an hour straight. The test
was therefore broken up into 4 sessions of about 15 minutes each.

It is also desirable to present the test sounds in a different order for each group of (four) subjects. This
is to minimize the effect of a possible sequential contraction bias, for instance, where a quiet sound would
always be played after a loud one, leading to the former being judged as unreasonably not-annoying due to
the consistent contrast between the two sounds. Similar biases may arise when a sound is always played at
the beginning or end of the test. Therefore, each group of subjects should be presented with a systematically
unique ordering of the sounds in order to alleviate possible biases.¹⁷

To accomplish this, first, the 103 samples were partitioned into four blocks of relatively equal length. 
This was done in such a way that similar sounds were not assigned to the same block (e.g., no repeats ever 
occurred in an individual block). Then, a Latin Square ordering was used to assign one of the blocks of 
sounds to one of the sessions (see, e.g., Montgomery¹⁸). As there were more groups of subjects than blocks,
the Latin Square pattern was reversed for groups 5-8, and then repeated for groups 9 and 10. For each 
test session, the ordering of the approximately 26 sounds within a block was randomized uniquely for each 
session and each group.* 
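The ordering scheme can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the cyclic form of the Latin Square, the reading of "reversed" as flipping the session order of each row, the seeding rule, and the round-robin partition of placeholder IDs 0-102 are all hypothetical. In particular, the partition below does not enforce the constraint that similar or repeated sounds fall in different blocks.

```python
import random

N_BLOCKS = 4


def block_order(group):
    """Block index played in each of the four sessions for a 1-based group number."""
    row = [(group - 1 + session) % N_BLOCKS for session in range(N_BLOCKS)]
    if 5 <= group <= 8:
        row.reverse()  # assumed meaning of the reversed pattern for groups 5-8
    return row


def session_playlists(blocks, group, seed=0):
    """Four playlists for one group, each block shuffled independently."""
    rng = random.Random(seed * 1000 + group)  # hypothetical per-group seeding
    playlists = []
    for block_idx in block_order(group):
        sounds = list(blocks[block_idx])
        rng.shuffle(sounds)
        playlists.append(sounds)
    return playlists


# Placeholder sound IDs 0..102 dealt into four blocks of 26, 26, 26, and 25.
blocks = [list(range(start, 103, N_BLOCKS)) for start in range(N_BLOCKS)]
schedule = {group: session_playlists(blocks, group) for group in range(1, 11)}
```

With this construction, groups 9 and 10 reuse the block orders of groups 1 and 2, while the within-block shuffles still differ from group to group.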


D. Signal Processing 


Once the sounds were selected for inclusion in the test, a chain of signal processing techniques was
implemented in MATLAB to condition the recordings into a form in which they could be presented to the subjects.


1. Upsampling 


All of the test samples were required to be at 44.1 kHz sampling frequency for playback in the Langley
Exterior Effects Room (EER, discussed further below). While the recordings of the SUI Endurance, road
vehicles, and the auralizations were already at this resolution, the recordings from Oliver Farms were at 20
kHz, having been recorded through the NI ADC. The process to convert the Oliver Farms recordings to the
standard resolution involved 3 steps: 1. Upsample to 100 kHz using a polyphase filter. 2. Create a time-base
at the new sampling frequency. 3. Use a cubic spline interpolation scheme to generate the samples at the
desired new time base.

In total, this process preserved the information in the Oliver Farms signals up to 9 kHz. This performance
is satisfactory given that the original recordings only contained information up to 10 kHz, and given the
high-frequency filtering step described next.
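The three steps can be sketched in plain Python. This is a stand-in under stated assumptions, not the MATLAB processing used for the test: the upsampler is a direct (non-polyphase) Hamming-windowed sinc interpolator, the final step uses a local four-point cubic (Lagrange) fit in place of a true cubic spline, and all function names and the 101-tap kernel length are hypothetical.

```python
import math


def windowed_sinc_kernel(up, half_width):
    """Hamming-windowed sinc taps that interpolate `up - 1` samples between inputs."""
    n_taps = 2 * half_width + 1
    taps = []
    for k in range(n_taps):
        m = k - half_width
        ideal = 1.0 if m == 0 else math.sin(math.pi * m / up) / (math.pi * m / up)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n_taps - 1))
        taps.append(ideal * window)
    return taps


def upsample(x, up, half_width=50):
    """Step 1: integer-factor upsampling (zero-stuffing plus low-pass, done in one pass)."""
    taps = windowed_sinc_kernel(up, half_width)
    n_out = len(x) * up
    y = [0.0] * n_out
    for i, sample in enumerate(x):  # input sample i sits at output index i * up
        for k, h in enumerate(taps):
            j = i * up + k - half_width
            if 0 <= j < n_out:
                y[j] += sample * h
    return y


def cubic_resample(y, fs_in, fs_out):
    """Steps 2 and 3: build the new time base and evaluate a local cubic on it."""
    n_out = int((len(y) - 1) * fs_out / fs_in) + 1
    out = []
    for n in range(n_out):
        pos = n * fs_in / fs_out  # new sample time, in units of input samples
        i = min(max(int(pos), 1), len(y) - 3)
        s = pos - i
        out.append(
            -s * (s - 1) * (s - 2) / 6.0 * y[i - 1]
            + (s + 1) * (s - 1) * (s - 2) / 2.0 * y[i]
            - (s + 1) * s * (s - 2) / 2.0 * y[i + 1]
            + (s + 1) * s * (s - 1) / 6.0 * y[i + 2]
        )
    return out


def resample_20k_to_44k1(x):
    """20 kHz -> 100 kHz (factor of 5) -> 44.1 kHz."""
    return cubic_resample(upsample(x, up=5), 100_000.0, 44_100.0)


# Synthetic 1 kHz tone sampled at 20 kHz (illustrative only).
tone_20k = [math.sin(2 * math.pi * 1000 * n / 20_000) for n in range(400)]
tone_44k1 = resample_20k_to_44k1(tone_20k)
```

Oversampling to 100 kHz before interpolating keeps the local cubic fit accurate, since the signal content (below 10 kHz) is then well below the intermediate Nyquist frequency.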


2. Filtering 


While the recording system used at Oliver Farms included an NI ADC that was DC coupled, the other
recordings were made with the TASCAM recorder, which did not necessarily have the dynamic range of the
former. To ensure that the recorder would not clip, for example, due to wind noise, the internal high-pass
filter of the TASCAM was set to 50 Hz. To gain parity between the two recording platforms, as well as to
filter out unwanted extraneous noise from the Oliver Farms recordings, a 2nd-order Butterworth high-pass
filter was designed and applied to those recordings.

For all of the recorded sUAS, 50 Hz worked out to be less than half of the nominal rotor blade passage 
frequency, implying that the filtering process would have no significant impact on the components of the 
recording created by the sUAS. It is likely that the road vehicles produced content below this frequency, 
but it is unlikely that this content would have bearing on the perceived annoyance of the signal (as will be 
discussed further below). 

*Although this does not preclude the chance of two sounds occurring in a row for two different subject groups, given that
there are at least 25 sounds in a session, the chance of this occurring to the extent that it would create a bias is vanishingly
small. Additionally, it is always good practice to have at least one random layer in a test design.¹⁸


In addition to this processing, a 1st-order Butterworth low-pass filter was applied to the Oliver Farms
recordings to remove high frequency white noise observed in some of those recordings. This filter was designed
to begin to be effective at 3 kHz. While this frequency is still in the range of interest for human annoyance,
the filter was likely not aggressive enough to significantly impact the content of the recording until, perhaps,
an octave above its nominal frequency. In this way, the filter smoothly separated the relevant components
of the recordings from those that might extraneously impact the annoyance rating thereof.
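The qualitative claims about the two filters can be checked against the ideal Butterworth magnitude response. The sketch below assumes that the quoted 50 Hz and 3 kHz values are the -3 dB cutoff frequencies and uses the analog prototype response, ignoring any frequency warping in the digital implementation; the function name is hypothetical.

```python
import math


def butterworth_gain_db(f_hz, cutoff_hz, order, kind):
    """Magnitude response, in dB, of an ideal analog Butterworth filter."""
    ratio = (f_hz / cutoff_hz) ** (2 * order)
    if kind == "highpass":
        power = ratio / (1.0 + ratio)
    elif kind == "lowpass":
        power = 1.0 / (1.0 + ratio)
    else:
        raise ValueError("kind must be 'highpass' or 'lowpass'")
    return 10.0 * math.log10(power)


# 2nd-order high-pass at 50 Hz, evaluated at twice the cutoff (a lower bound on
# the blade passage frequencies, per the text).
hp_at_100_hz = butterworth_gain_db(100.0, 50.0, 2, "highpass")

# 1st-order low-pass at 3 kHz, evaluated at the cutoff and one octave above it.
lp_at_3_khz = butterworth_gain_db(3000.0, 3000.0, 1, "lowpass")
lp_at_6_khz = butterworth_gain_db(6000.0, 3000.0, 1, "lowpass")
```

Under these assumptions the high-pass filter removes only about a quarter of a decibel at 100 Hz, while the low-pass filter is about 3 dB down at 3 kHz and about 7 dB down an octave higher.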


3. Windowing 


After the start and stop points were determined for the individual sample sounds, 2 seconds were added
back to each end, during which a fade was applied. This was done to ensure that there were no audible
artifacts at the ends of the playback, and so that the sounds would not appear to ‘jump’ out from the
background in a startling manner.

A suitable fade-in function was found to be F: 


(2-1.54) 
F(t) = A for t ∈ [0,2] (1)


which multiplies the pressure-time signal (the time-reverse was used for the fade out). 


4. Final Considerations 


As indicated in the signal selection section, an overall gain was added to some of the signals in order to help 
the full set span a reasonable range. Once this, as well as the other signal processing tasks, had been applied, 
the sounds were written to 32-bit floating point wave files (a format that preserved the working units of Pa) 
for playback in the EER. 


E. Test Environment 


The test was conducted in the Exterior Effects Room (EER¹⁹) at the NASA Langley Research Center in
Hampton, Virginia. The EER is a small, acoustically-treated auditorium with a 31-channel sound reproduction
system capable of simulating 3D spatialization of point-source sounds in real time. The reproduction
capability of the EER extends over a compensated frequency range of 20 Hz to 20 kHz and from approximately
23 to 94 dBa, at which point the playback system is limited to protect the subjects’ hearing.

The EER is approved to test 4 subjects at a time as shown in Figure 4(a). The subjects sit in
non-consecutive seats located close to the geometric center of the room. Subjects are visually isolated from one
another by an acoustically transparent curtain. Between the subjects is a microphone belonging to a sound
level meter (SLM) that is used both for calibration and to monitor the sound levels during the test.


[Figure 4(b) plot. Axes: Sound Pressure Level, dB re: 20 µPa, versus Frequency, Hz. Curves: Oliver Farms Ambient; EER Ambient.]


(a) (b)


Figure 4. The test facility. (a) NASA employees posing as test subjects in the EER. (b) The background noise
condition applied during the test, as compared to ambient noise recorded at Oliver Farms (f_s = 44.1 kHz).
1. Ambient Noise Condition


An artificial ambient noise condition was created in the EER based on the ambient noise levels observed at
Oliver Farms (the loudest recording site). This ambient was created by filtering white noise into a 3-minute
long loop that played through all speakers in the EER. The level of this loop was set to produce an overall
sound pressure level of 36 dBa. The frequency content of this noise is shown in Figure 4(b), as compared
to the content of a recording of the ambient noise at Oliver Farms (in which the ‘bump’ above 2 kHz is due
primarily to cicadas).


2. Calibration 


Calibration of the sounds for playback in the EER was accomplished by use of the EER SLM. The sample 
sounds were played back with the EER empty and the ambient noise off. The dBa max levels were compared 
to the maxima predicted by processing the sample sounds in the same manner as the SLM. The difference 
between this prediction and measurement was minimized. 

In this way, calibration factors were obtained separately for the set of flyovers, fly-bys (again, IDs 100-199),
and drive-bys. These were separated due to the fact that the difference in the geometry of their
playback in the EER created systematic differences of up to 1.5 dBa between the groups. For the entire set,
the standard deviation of the difference between the predicted and calibrated measured sounds was 0.6 dB.
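One simple way to minimize the mismatch between predicted and measured maxima is a single least-squares offset per group, which is just the mean difference. The sketch below assumes that form of minimization, which the text does not specify; the function names and all level values are made up for illustration.

```python
from statistics import mean, pstdev


def calibration_offset(predicted_db, measured_db):
    """Single dB offset minimizing the squared prediction-measurement mismatch."""
    return mean([m - p for p, m in zip(predicted_db, measured_db)])


def residual_spread(predicted_db, measured_db, offset):
    """Standard deviation of what remains after the offset is applied."""
    return pstdev([m - p - offset for p, m in zip(predicted_db, measured_db)])


# Hypothetical (predicted, measured) maxima in dBa for two playback geometries.
groups = {
    "flyover": ([60.0, 65.0, 70.0], [61.0, 66.5, 70.5]),
    "drive-by": ([58.0, 62.0], [57.5, 62.5]),
}
offsets = {name: calibration_offset(p, m) for name, (p, m) in groups.items()}
spreads = {name: residual_spread(p, m, offsets[name]) for name, (p, m) in groups.items()}
```

Computing the offset per group, rather than once for the whole set, is what absorbs the systematic geometry-dependent differences noted above.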


3. Geometric Processing 


The GPS vehicle-position data were used by the EER’s spatialization capabilities to create the impression
that the sample sounds were traveling along the course taken during their recording. This required a
‘retarded time’ correction to be computed: the GPS data produced the location of the source at the instant
the noise was being received, not at the instant it was transmitted from that source. Further, all flyovers
were normalized so that they passed directly overhead at their closest point. Finally, a stereoscopic projection (i.e.,
acoustic fisheye) was applied to the road vehicle sounds after pilot testing in order to prevent them from
sounding as if they were “driving through the middle of the room.”
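
The retarded-time correction can be sketched as follows. For a reception time t, the emission time tau satisfies tau = t - |x(tau) - x_obs| / c, which can be solved by fixed-point iteration for a subsonic source. This is a minimal sketch and not the EER implementation: the function names, the nominal speed of sound, and the straight-line trajectory are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value


def emission_time(t_receive, position_at, observer, c=SPEED_OF_SOUND,
                  tol=1e-9, max_iter=50):
    """Solve tau = t_receive - |x(tau) - observer| / c by fixed-point iteration."""
    tau = t_receive  # initial guess: no propagation delay
    for _ in range(max_iter):
        distance = math.dist(position_at(tau), observer)
        tau_next = t_receive - distance / c
        if abs(tau_next - tau) < tol:
            return tau_next
        tau = tau_next
    return tau


def straight_flyover(speed, height):
    """Hypothetical trajectory: level flight along x, overhead at t = 0."""
    return lambda t: (speed * t, 0.0, height)


path = straight_flyover(speed=5.0, height=20.0)
observer = (0.0, 0.0, 0.0)

# Sound received at t = 0 left the vehicle slightly earlier, at tau < 0.
tau = emission_time(0.0, path, observer)
source_position_at_emission = path(tau)
```

The iteration converges quickly because each pass shrinks the error by roughly the ratio of vehicle speed to sound speed.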


F. Test Execution 


The test, dubbed WGA-I, took place during the last week of February 2017. Two groups of four subjects were
tested each day for five days, for a planned total of 40 subjects. Two subjects did not report for the test on time and
were therefore excluded from participation, resulting in a pool of 38 subjects. All subjects listened to all
sounds, and all responses were successfully captured.

The subjects were recruited from the local community by a contractor, typically from within 50 miles of
NASA Langley. The requirements for participation, as provided to the contractor, were to provide subjects
that:


• Have no more than 30 dB of hearing loss (relative to reference hearing thresholds in ISO 389-17°) over
the frequency range of 250 Hz to 4,000 Hz.


• Are between 18 and 50 years of age.


• Create an overall proportion of between 1/3 and 2/3 female participants.


Upon arrival, subjects were given a pre-test hearing screening. Before the test began, subjects listened 
to a suite of 10 samples selected from the test in order to familiarize them with the breadth of sounds they 
would be listening to. They then completed 5 practice questions to ensure that they understood the question 
and how to record their responses using touch-screen tablet computers provided to each subject. 

Subjects were asked to rate their annoyance on a single scale shown in Figure 5. This question was 
formulated based on the recommendation by Fields et al.?' Using this scale, specifically the wording (‘Not 
at all,’ etc.), produces subject responses that are linear with the perceptual quantity under study. This 
facilitates the use of the well-understood linear regression model for data analysis. This scale was stored 
as a numeric value between 1 and 11, with the even numbers corresponding to the five ticks/words on the 
scale.» 


b It is good practice to give the subjects the ability to respond past the final tick marks on the ends of the scale.!”

The test proceeded through the 4 sessions, encompassing all 103 sounds, as described above. Between
sessions, subjects were allowed to take an elective break (e.g., to use the restroom). After the
test, subjects were required to have their hearing re-screened to ensure that the sounds they had been
exposed to during the test did not significantly alter their hearing threshold (a possible indication of
noise-induced hearing loss). The total participation time, between pre- and post-test hearing screening,
was between 1.5 and 2 hours. The test protocol was approved by the NASA Langley Institutional
Review Board.


Figure 5. Screenshot of the question posed to the subjects. [Question shown: “How annoying was the sound to you?” Scale labels: Not at all, Slightly, Moderately, Very, Extremely.]


Heuristics 


Following the test there was a period of time for discussion between the subjects and the researchers.
Although this discussion was not compulsory, the insights given by the subjects, for example, into their
decision-making processes, can be valuable. For this reason, a selection of observations is given here.

Some subjects reported developing heuristics early on that they then applied to the sounds throughout
the rest of the test. There was often disagreement, even contradiction, between the strategies of different
people. For example, some subjects reported finding “high-pitched” sounds more annoying, while
others thought “low-pitched” ones were worse, though clearly, these descriptions do not necessarily reflect objective
measures of the sounds. Other subjects reported the opposite approach: judging each sound on its own merit, though
it is likely that the use of heuristics was more prevalent.

Another typical comment was that sounds that appeared to linger were judged to be more annoying 
than those that did not. As long as a sound was not startling, a perception that the sound would “be over 
with soon” alleviated annoyance. Further, sounds that were described as being “patterned” had a greater 
tendency to evoke this response than ones that were more qualitatively constant. 

No subjects reported having their responses affected by the presentation location of the sound. This was 
true both when this information was volunteered, and when it was inquired about directly by the researchers. 
Very few subjects were able to identify the sUAS sounds as coming from ‘drones.’ 

Again, as these responses were not compulsory, they should not be taken as necessarily representative of
a significant portion of the subject population.


IV. Initial Analysis 


There are several ways to analyze the type of data collected in this test. Only one of these, linear
regression, will be discussed at significant length here. Other analyses that will be undertaken in the
future, related to the remaining research questions, are discussed briefly at the end of this section.


Linear Regression 


Several forms of linear regression were performed between noise metric values computed for the sample 
sounds and the subject responses to those sounds. Again, the form of the question posed to the subjects 
was meant to produce responses that vary linearly with annoyance, making more exotic forms of regression 
likely to be unnecessary. 

For this analysis, only recorded sounds were considered; the auralizations, both of the DEP vehicle and 
the quadcopter, are left out. The inclusion of those sounds in the test was not intended to support direct
comparison between auralized sounds and recordings; therefore, any line fit through the data should not
attempt to account for the variance between those classes of sounds. Additionally, repeats of the SUI
sounds are omitted; thus of the first 18 sounds listed in Table 3, only IDs 1 and 101 are included in this 
analysis. This is to ensure that the subject responses are identically distributed between sample sounds, 
that there is no dependence of one sound on another (which there would be if two of those sounds were 
similar/the same), and that no set of samples or flight conditions has more influence on the fit than another. 
In total, there were 46 sUAS sounds and 20 road vehicle sounds included in the regression.


A. Noise Metrics


Figure 6. Example noise metric calculations for a box truck drive-by, ID 418. The black and blue traces
indicate the underlying time-varying metric calculations, and the red and green horizontal lines indicate the
representative single values.


Noise metrics are signal processing techniques that are used to reduce a pressure-time history into a single 
number that is, perhaps, representative of the annoyance due to that sound. The set of metrics used here 
was arrived at after computation of the augmented regression model (described below) on a set of more than 
10 metrics. Those shown here are meant to be both illustrative and to present the best results found thus
far. Metric values were computed on the sample sounds recorded in the EER using the same equipment as 
the field recording, with the recording microphone placed near the center of the four subject locations, close 
to the SLM microphone used for calibration. 

The four metrics shown are: A-weighted Sound Exposure Level (SELa), C-weighted Sound Exposure
Level (SELc), Effective Perceived Noise Level (EPNL), and Loudness exceeded 5% of the time (L5).° Examples
of these metrics are shown in Figure 6 for sample ID 418, a drive-by of the box truck that has
significant low-frequency energy. Briefly, these metrics are:


• SELa: This metric is the time integration of A-weighted sound energy, normalized to a duration of
1 second. The psophometric ‘A’ frequency weighting is the most commonly encountered measure of
noise (and is implemented on nearly all hand-held sound level meters). dBa is based on the pure-tone
response of the human auditory system at a level of 40 Phon, corresponding, perhaps, to the level of
a calm conversation in a very quiet setting. Despite not seeming applicable to loud/outdoor sounds,
dBa has been found to correlate with annoyance across a large range of absolute levels. dBa is
typically averaged in time in order to smooth its response. The ‘slow’ averaging is used here; it is
exponential with a time constant of 1 s. The computation of SELa is a straightforward Riemann sum
of the energy corresponding to dBa over time.


• SELc: The computation of SELc is identical to SELa except that the psophometric ‘C’ frequency
weighting (dBc) is used instead of dBa. This corresponds to the human auditory system’s frequency
response at 100 Phon (corresponding more closely to a concert than a calm conversation), which is a
significantly higher level than for A-weighting and more closely matches the levels at which the sample
sounds were presented to the subjects. The result is that SELc includes a significant amount of
low- to mid-frequency energy (between, nominally, 100 and 1000 Hz) that SELa does not.


This effect can be seen in the example in Fig. 6(a). Here, the sound generated by the vehicle is seen
to have significantly more low-frequency energy included in SELc than in SELa.


• EPNL: This metric is used by regulatory bodies internationally to certify aircraft designs for community
noise exposure.?” It is a time-integration of a metric called the Tone-Corrected Perceived
Noise Level (PNLT). PNLT is, in turn, based on one-third octave band data, sampled at half-second
intervals. For a given interval, the one-third octave band data is converted into ‘noy bands’ by a
non-linear transformation based on the response of the human ear at various absolute levels. A ‘tone
penalty’ is computed between these bands: if adjacent bands have greatly differing magnitudes,
there is a penalty associated with that difference. The noy bands are combined based on an empirical
summation formula, and the tone penalty is added to create the PNLT value for that time interval.


c For a more technical discussion of SELa, SELc, and EPNL, see Ruijgrok.?? For information on L5, see the DIN or ISO
standard from which it is derived.?°


The unit of PNLT is the PNdB. Due to the non-linearities in the PNLT computation, it is important
to note that this is a decibel-like quantity, and does not scale linearly with large changes in overall gain.
For small changes in dB, the corresponding change in PNLT will be nearly equal, but large changes in
dB can cause significant divergences between dB and PNdB.


EPNL is an integration of the PNLT time history for the period in which the noise is within 10 PNdB
of the peak value. In this way, EPNL takes the duration of the noise into account. An example PNLT
time history and EPNL value are shown in Fig. 6(b).


• L5: There are several computational models that produce estimations of psychoacoustic loudness, a
quantity akin to, but not directly related to, annoyance. In general, they are all based on models of
the human auditory system. The current ‘Zwicker’ model is used in this study, as described by an
international (ISO/DIN) standard.?? By modeling the action of the various physical components of the
ear, this model implicitly includes effects such as upward masking and non-linear frequency weighting,
as well as some temporal effects. It does not include a penalty for tonality as EPNL attempts to
do.* Its calculation results in a time-history, sampled at 100 Hz, in the decibel-like units of Phon. An
example is shown in Fig. 6(c).


Again, the metric must produce a single-number representation of a sound to be useful for linear
regression. There is no single widely accepted way to integrate loudness across time as there is for
dBa, dBc, and PNLT. Instead, it is common practice to use quantiles of the loudness time history as a
representative point.2* In this case, it was determined that loudness exceeded 5% of the time produced
results that were highly correlated with the mean subject response.© While this approach satisfies the
single-value criterion, it means that L5 will not account for duration effects.
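
The single-number reductions described in the list above can be sketched as follows. This is an illustrative outline under stated assumptions, not the processing used in the study: it starts from an already weighted level time history (the A/C weighting, the noy conversion, and the loudness model are not reproduced), the function names are hypothetical, the 10 s reference duration for the EPNL-style integration is taken from common certification practice, and the quantile uses a simple nearest-rank rule.

```python
import math


def sound_exposure_level(levels_db, dt, t_ref=1.0):
    """Riemann sum of the energy in a level time history, normalized to t_ref seconds."""
    energy = sum(10.0 ** (level / 10.0) for level in levels_db) * dt
    return 10.0 * math.log10(energy / t_ref)


def effective_level(levels_db, dt, drop_db=10.0, t_ref=10.0):
    """EPNL-style integration over the span between the first and last samples
    that lie within drop_db of the peak (t_ref = 10 s is an assumed reference)."""
    peak = max(levels_db)
    inside = [i for i, level in enumerate(levels_db) if level >= peak - drop_db]
    span = levels_db[inside[0]:inside[-1] + 1]
    return sound_exposure_level(span, dt, t_ref)


def level_exceeded(levels, percent):
    """Level exceeded roughly `percent` of the time (nearest-rank quantile)."""
    ordered = sorted(levels, reverse=True)
    rank = max(1, math.ceil(len(ordered) * percent / 100.0))
    return ordered[rank - 1]


# Hypothetical 10 s event at a constant 60 dB, sampled every 0.5 s.
steady = [60.0] * 20
sel = sound_exposure_level(steady, dt=0.5)   # 10 s of energy referred to 1 s
epnl_like = effective_level(steady, dt=0.5)  # same energy referred to 10 s

# Hypothetical loudness history with the values 1..100.
l5 = level_exceeded(list(range(1, 101)), 5)
```

The steady-level case makes the duration term visible: the exposure level exceeds the steady level by 10 log10 of the duration in reference units, whereas a quantile such as L5 carries no duration information.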


In summation, SELa and SELc are the simplest measures, the difference between the two providing a
glimpse of how the various frequency components of the noise impact annoyance. EPNL is a
more complex calculation that has widespread regulatory use, but does not reflect contemporary knowledge
of the human auditory system. L5 is the most accurate model of human hearing included, but may omit
important aspects of the noise captured by the other metrics, such as corrections for tonality and duration.
The performances of the various metrics on the data set may help to illuminate the importance of these
effects for understanding sUAS-noise annoyance.


B. Subject Responses 


The regression analyses shown here were performed between the metric values of a sample sound and the 
arithmetic mean of the subject responses for that sound. It is important to observe how that mean relates to 
the spread of the subject responses for the samples. Figure 7 shows the histogram of the subject responses to 
the first sample sound (ID #1, an SUI flyover). It can be seen that the responses do not follow a simple
bell-shaped distribution. Further, the responses encompass nearly the entire range of the scale. This indicates
that there is a high degree of variability between the subjects’ impressions of the sounds.

There is exactly one response for each sample sound and each subject that participated in the test. For
sample $i$ and subject $s$, the response is $y_{i,s}$, where $i$ is an index within the set of sample sounds found in
Table 3 (and included in the regression analysis), and $s \in [1, 2, \ldots, S]$ for $S = 38$, the number of subjects that
participated. For a given sample sound $i$, the arithmetic mean is:


$$\bar{y}_i = \frac{1}{S} \sum_{s=1}^{S} y_{i,s} \qquad (2)$$


d There are other psychoacoustic measures based on calculations similar to loudness that attempt to account for this as well
as other possible qualitative effects (e.g., roughness); however, neither calculations of these measures, nor incorporation of these
into a single explanatory model of human annoyance, are straightforward matters.?4

e The 5% quantile is a commonly used value, though it is by no means an agreed-upon value across authors. In addition, the
use of loudness in units of Sone (a power-like unit of loudness) is also common for the quantile calculation.!>: 24

It is also important to estimate the confidence in the mean given the spread in the data. A confidence
interval (CI) is a measure of the certainty in the estimate of the mean, and a measure of the power
of the test to resolve this statistic. In general, if the confidence intervals of two samples overlap at all, it
can be said that the responses are indistinguishable given the power of the test to resolve the
two (Analysis of Variance, a more rigorous test of this sort of question, is left out of this publication).
The size of a CI is expected to shrink, roughly, as the inverse square root of the number of responses (subjects).

Although CIs typically require an assumption about the underlying distribution of the data, an
assumption that would be dubious after observing Fig. 7, CIs can still be constructed for
the means of these samples via bootstrapping techniques. The method used to generate the intervals
for Fig. 7, as well as all subsequent figures, was the Bias-Corrected, Accelerated method (BCa), as
implemented in the MATLAB bootci() function.?° 100,000 bootstrap samples were used to compute each CI
which, while perhaps excessive, ensures convergence of the bootstrapped interval values.


Figure 7. Histogram of subject responses for one of the sample sounds, with the mean and confidence interval indicated. [Horizontal axis: Annoyance Rating.]


Figure 8. Scatter plots of mean subject responses and confidence intervals for the samples included in the
regression analyses. The data in the two plots are the same, only the markers and colors are changed to aid
the eye in vehicle identification.


Figure 8 shows the means and CIs for the samples included in the regression. The markers are colored to
allow one to easily identify the corresponding vehicle. CIs generated via the BCa method are not constrained
to be symmetric around the mean or the same size across samples. Both effects can be seen in this figure,
and are due to the variation of skewness and central tendency in the subject data for the different samples
respectively.
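
A bootstrapped confidence interval for a mean response can be sketched as follows. Note that this sketch uses the simpler percentile bootstrap rather than the BCa method named above, so it illustrates the resampling idea only; the function name, the number of resamples, and the ratings are assumptions.

```python
import random
from statistics import mean


def percentile_bootstrap_ci(responses, n_boot=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `responses`."""
    rng = random.Random(seed)
    n = len(responses)
    # Resample the subjects with replacement and record each resampled mean.
    boot_means = sorted(mean(rng.choices(responses, k=n)) for _ in range(n_boot))
    alpha = (1.0 - confidence) / 2.0
    lower = boot_means[int(alpha * n_boot)]
    upper = boot_means[int((1.0 - alpha) * n_boot) - 1]
    return lower, upper


# Hypothetical ratings on the 1-11 scale from 38 subjects for one sound.
ratings = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8,
           8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 1, 2, 4, 6, 8, 10, 5, 7]

low, high = percentile_bootstrap_ci(ratings)
```

Because no distribution is assumed, the interval need not be symmetric about the mean, which is the behavior noted for Figure 8.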


C. Pooled Regression 


The first regression analyses were carried out by pooling all of the included samples into one set and fitting
a line given the noise metric calculations on that set.’ The model function is therefore:


$$\hat{Y} = \beta_0 + X\beta_1 \qquad (3)$$


f Linear regression is a much written-about subject. Any basic book on statistics will cover the topic. The reader is directed
to Chatterjee and Hadi?® for a good introduction.

Figure 9. Pooled regression results for the four noise metrics.


where $\hat{Y}$ is a column vector of model-predicted mean responses to the sample sounds, $X$ is the column
vector of noise metric values for those samples, and $\beta_0$ and $\beta_1$ are the scalar regression coefficients (the
results of the regression process). This process minimizes the sum of the squares of the difference between
the model-predicted $\hat{Y}$ values and the column vector of the observed means $\bar{Y}$:


$$\min_{\beta_0, \beta_1} \left(\hat{Y} - \bar{Y}\right)^{T} \left(\hat{Y} - \bar{Y}\right) \qquad (4)$$

The MATLAB fitlm() function was used to perform this minimization for all cases here. The results of
these ‘pooled’ analyses are shown in Figure 9. Here, the samples corresponding to sUAS and road vehicles
(Cars) are shown with different colors/markers, even though the line is fit to the totality of the data.

The main numerical result of these analyses is the square of the correlation coefficient, R². This measures
the fraction of the variance observed in the mean responses that is accounted for by the model function, and is
expressed as a number between 0 and 1. SELa is seen to offer the best performance. This is contrasted with
SELc, which performs very poorly. This difference is primarily caused by the inclusion of the low-frequency
energy from the road vehicle sounds in SELc, shifting the metric values for those sounds upward relative to
the sUAS samples. This forces the regression line to flatten, causing the SELc model to account for a lower
amount of the observed variance in the ordinate direction.
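
The pooled fit and its R² can be sketched with the closed-form least-squares solution for a single predictor. This is a minimal sketch and not the fitlm() call used in the study; the function name and the metric and response values are assumptions.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y ~ b0 + b1 * x; returns (b0, b1, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # R^2 is the fraction of the variance in y accounted for by the line.
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot


# Hypothetical metric values (dB) and mean annoyance responses.
metric = [55.0, 60.0, 65.0, 70.0, 75.0]
mean_response = [3.1, 3.9, 5.2, 5.8, 7.0]

b0, b1, r_squared = fit_line(metric, mean_response)
```

If one group of samples is shifted along the metric axis without a matching change in response, the pooled slope flattens and R² drops, which is the mechanism described for SELc.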

EPNL offers similar performance to SELa. The scatter of the individual data points is similar between
Fig. 9(a) and (c), indicating that the two metrics are capturing similar aspects of the sounds. Previous
studies have shown these two metrics to often be comparable for aircraft noise.?7

L5 offers surprisingly poor performance (at least for this first regression), though it can be seen that this
is not caused by the same forces that result in poor performance for SELc. The sUAS and road vehicle
sample sets, as measured in L5, span similar loudness values, indicating that L5 is simply failing to capture
some important aspect of the noise.

Another important observation is that the road vehicles seem to be systematically judged to be less
annoying than the sUAS for the same metric value. This seems to hold true for all of the metrics.
This observation motivates the next regression model.


D. Regression Diagnostics 


Having made these initial observations, it is important to address the underlying assumptions of the linear 
regression method, and to make sure that they are satisfied for this data set. Beyond the expectation that the 
data would follow a linear trend given the form of the question, there are three main criteria for applicability 
of linear regression. These are: independence of the samples, normal-distribution of the residuals (with 
0-mean), and homoskedasticity — that the distribution of the residuals does not vary systematically with 
the predictors.7° 

Regarding independence, this was provided, insofar as it can be provided in human subject testing, by the 
test protocol and design: All subjects went through familiarization and training sessions before the test began 
in order to acquaint the subjects with the range of the sounds and the use of the response scale. Further, 
the randomization of the test order for each group of subjects provided that the effects of learning/fatigue 
would be well—distributed among the groups. Lastly, all subjects responded to all questions, and, for these 
analyses, no multiple responses were grouped into a single mean. 

The last two requirements are typically fulfilled by observation of plots of the residuals of a regression. 
A histogram of the residuals can provide confidence that they are following a bell-shaped distribution (the 
requirement of normality being relatively weak!’). Heteroskedasticity, as well as other unwanted systematic 
effects in the data, can be seen by plotting the residuals against their underlying metric values. In all cases, 
these plots were generated and showed that the assumptions of the regression method were sufficiently met. 


E. Augmented Regression 


Given the observed systematic differences between the mean annoyance values of the sUAS samples and
the road vehicle samples, an augmented linear model was proposed. A binary term $C$ was added to the
model equation:

$$\hat{Y} = \beta_0 + X\beta_1 + C\beta_2 \qquad (5)$$

$c_i$, the element of $C$ corresponding to sample index $i$, is 1 if $i$ corresponds to a road vehicle sample
and 0 if it corresponds to an sUAS sample. This effectively allows for two lines to be fit to the data: one
to the sUAS data, and one to the road vehicle data. These lines are constrained to have the same slope,
as there is not a compelling reason that, asymptotically, they should be different. Thus a single slope
is determined for all of the data while $\beta_2$ captures the offset between the two lines in the ordinate
direction. This offset, as measured in the units of the metric, is given by $\beta_2/\beta_1$. A sample result of
regressing this model on the SELa data is shown in Figure 10. The results of using this model on all 4
metrics are given in Table 4.


Figure 10. Regression results for the augmented linear model for SELa. [Plot: mean annoyance response versus Sound Exposure Level, A-Weighted [dB]; legend: sUAS Fit, Cars Fit; annotations: Total R² = 0.82, Car Offset = -5.64 [dB].]

The explanatory value of the model for all metrics, in terms of R², is greatly improved. The value of
the offset is not a small number for any of the metrics, indicating that there is a significant amount of
subjective difference between the sUAS and road vehicle sounds that is not captured by any of the metrics.
The inclusion of this binary predictor in the regression is shown to be very significant in all cases: the
p-values of the t-tests for inclusion of this predictor are well below the canonical threshold of 0.05
(95% confidence).


Table 4. Regression results for the augmented linear model. Offset is measured in the respective metric’s unit.


Metric | R² | Offset
SELa | .82 | 5.6
SELc | .68 | 12.8
EPNL | .80 | 7.6
L5 | .75 | 1.0
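
The augmented model of Eq. 5 can be sketched without a general matrix solver, because a least-squares fit with one shared slope and a binary group term reduces to a pooled within-group slope plus one intercept per group. The sketch below is illustrative, not the fitlm() call used in the study; the function name and data values are assumptions, chosen so that the road vehicle line sits exactly one response unit below the sUAS line.

```python
from statistics import mean


def fit_common_slope(x, y, is_car):
    """Least-squares fit of y ~ b0 + b1 * x + b2 * c with c = 1 for road vehicles.
    Returns (b0, b1, b2)."""
    uas = [(xi, yi) for xi, yi, c in zip(x, y, is_car) if not c]
    car = [(xi, yi) for xi, yi, c in zip(x, y, is_car) if c]
    sxx = sxy = 0.0
    centers = []
    for group in (uas, car):
        gx = mean(p[0] for p in group)
        gy = mean(p[1] for p in group)
        centers.append((gx, gy))
        # Pool the within-group sums so both lines share a single slope.
        sxx += sum((p[0] - gx) ** 2 for p in group)
        sxy += sum((p[0] - gx) * (p[1] - gy) for p in group)
    b1 = sxy / sxx
    b0 = centers[0][1] - b1 * centers[0][0]         # sUAS intercept
    b2 = (centers[1][1] - b1 * centers[1][0]) - b0  # road vehicle shift
    return b0, b1, b2


# Hypothetical metric values (dB) and mean responses.
metric = [50.0, 60.0, 70.0, 60.0, 70.0, 80.0]
response = [2.0, 4.0, 6.0, 3.0, 5.0, 7.0]
is_car = [False, False, False, True, True, True]

b0, b1, b2 = fit_common_slope(metric, response, is_car)
offset_in_metric_units = b2 / b1  # negative when road vehicles are rated lower
```

Dividing the ordinate shift by the slope converts it into the horizontal distance between the two lines, which is why the offset carries the metric's own unit.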


F. Bootstrapped Regression 


The last form of regression analysis deals with the determination of confidence intervals for the R² and offset
values that have been determined for the augmented model. These interval estimates will help to indicate
whether, given this dataset, one metric can confidently be resolved from another, and whether the offset
value is significantly different from 0.®

To generate these intervals, a non-parametric bootstrapping technique was used, similar to the BCa
approach used to bootstrap CIs for the means. For each bootstrap sample $b$, the original mean subject response
vector was modified by resampling, with replacement, from the underlying subject data, and calculating new
means to form a resampled $\bar{Y}_b$. The new vector $\bar{Y}_b$ was then used to perform a linear regression (as in Eq. 4).
The result of each bootstrap sample is a $\beta_{0,b}$, $\beta_{1,b}$, and $\beta_{2,b}$ that describe the best-fit augmented model to
the resampled means. This process is repeated many times to form the non-parametric distribution of these
$\beta$s. This technique is a form of the ‘percentile’ bootstrapping method, and is similar to BCa, though slower to
converge.?/>

Once the sets of bootstrap $\beta$s have been assembled, the 2.5% and 97.5% percentiles are taken of the
sets, and presented as the two-sided 95% confidence intervals. 20,000 bootstrap samples were drawn for
the results shown here, which produced convergence of the CIs to one part in 1,000 in the interval
estimates, a process that took about 30 minutes per metric on one core of a contemporary laptop. The
results are shown in Table 5, and graphically in Figure 11.

First, it is important to observe that the mean values for R² are reduced by bootstrapping. This is
due to the fact that this method effectively reintroduces the variance behind the means of the original
$\bar{Y}$ vector, so that there is more variance, but the same best-fitting model. The intervals for R²
preserve the same trends as before: integrated metrics tend to perform better, while the low-frequency
energy in SELc causes it to perform poorly. The overlap in confidence intervals in the top half of
Fig. 11 indicates this dataset cannot be used to confidently determine the best of the four metrics (though
such a claim based on the results of a single subjective test would be dubious regardless of this outcome).


Figure 11. Bootstrapped 95% confidence intervals for the augmented linear regression model. Offset is measured in the respective metric’s unit. [Plot: R² (top half) and offset (lower half) for SELa, SELc, EPNL, and L5.]

In contrast, the fact that none of the confidence intervals in the lower half of Fig. 11 contains 0 is an
indication that each of the four metrics fails to capture some subjective aspect of sUAS noise relative to
road vehicle noise. The apparent negative covariance of R² and offset seen in Fig. 11 indicates a truism: the
better a metric performs, the smaller the offset between sUAS and road vehicles.
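
The bootstrapped regression can be sketched as below. This is one possible reading of the procedure and not the study's code: responses are resampled with replacement within each sample sound, the augmented model is refit to the resampled means, and the 2.5% and 97.5% percentiles of the resulting R² and offset values form the intervals. The function names, the number of resamples, and the synthetic data (a true shift of one response unit, or 5 dB at a slope of 0.2 per dB) are assumptions.

```python
import random
from statistics import mean


def fit_augmented(x, y, is_car):
    """Common-slope fit; returns (r_squared, offset in metric units = b2 / b1)."""
    groups = {False: [], True: []}
    for xi, yi, c in zip(x, y, is_car):
        groups[bool(c)].append((xi, yi))
    centers, sxx, sxy = {}, 0.0, 0.0
    for key, pts in groups.items():
        gx, gy = mean(p[0] for p in pts), mean(p[1] for p in pts)
        centers[key] = (gx, gy)
        sxx += sum((px - gx) ** 2 for px, _ in pts)
        sxy += sum((px - gx) * (py - gy) for px, py in pts)
    b1 = sxy / sxx
    intercept = {k: gy - b1 * gx for k, (gx, gy) in centers.items()}
    b2 = intercept[True] - intercept[False]
    ss_res = sum((yi - (intercept[bool(c)] + b1 * xi)) ** 2
                 for xi, yi, c in zip(x, y, is_car))
    y_bar = mean(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, b2 / b1


def percentile_interval(values):
    """Two-sided 95% interval from the 2.5% and 97.5% percentiles."""
    ordered = sorted(values)
    return ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered)) - 1]


def bootstrap_regression(x, responses, is_car, n_boot=500, seed=0):
    """Refit the model to means of resampled responses; return (R^2 CI, offset CI)."""
    rng = random.Random(seed)
    r2_values, offsets = [], []
    for _ in range(n_boot):
        y_b = [mean(rng.choices(r, k=len(r))) for r in responses]
        r2, offset = fit_augmented(x, y_b, is_car)
        r2_values.append(r2)
        offsets.append(offset)
    return percentile_interval(r2_values), percentile_interval(offsets)


# Synthetic data: 5 sUAS and 5 road vehicle sounds, 38 responses each.
data_rng = random.Random(1)
metric = [55, 60, 65, 70, 75, 60, 65, 70, 75, 80]
is_car = [False] * 5 + [True] * 5
true_mean = [0.2 * x - 8.0 - (1.0 if c else 0.0) for x, c in zip(metric, is_car)]
responses = [[data_rng.gauss(m, 1.0) for _ in range(38)] for m in true_mean]

r2_ci, offset_ci = bootstrap_regression(metric, responses, is_car)
```

An offset interval that excludes 0 corresponds to the situation in the lower half of Fig. 11.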


g The latter result should be expected given the result of the t-tests of the $\beta_2$s, but there is still a great difference between
the determination of a 5 dB offset ±4 dB, and 5 ±1 dB.

h There is a parametric method, implemented in the MATLAB regression package, to generate confidence bounds on regression
coefficients using the covariance matrix thereof. Although this method produces similar, though unequal, intervals for the
offsets, it does not produce estimates for R². Further, this parametric method does not address the underlying distributions
behind the $\bar{y}_i$ means.

Table 5. Bootstrapped 95% confidence interval results for the augmented linear regression model. Offset is
measured in the respective metric’s unit.


Metric | Median R² | R² CI | Median Offset | Offset CI
SELa | .71 | [.64, .78] | 5.64 | [4.3, 7.2]
SELc | .59 | [.51, .66] | 12.75 | [11.1, 14.6]
EPNL | .69 | [.62, .75] | 7.58 | [6.0, 9.4]
L5 | .64 | [.57, .72] | 8.60 | [5.9, 9.3]


Discussion of Regression Results 


The idea that various sources of noise may elicit significantly different annoyance responses is not unprece- 
dented. A germane example, though formulated for multiple-event annoyance, is work by Miedema and 
Oudshoorn that showed statistically significant differences in the DNL (an aggregate noise metric) response 
between road noise, rail noise, and aircraft noise.7° 

In the present study, the subjects responded to single events of noise, and were not instructed on what 
the sources of the noise may have been. Common comments during informal conversations after the test 
were that the subjects were typically not aware that the non-road noises came from ‘drones,’ and that the 
fact that some were flying overhead and some were presented as drive-bys did not significantly impact their 
judgments. This suggests that the subjects were cueing off of qualitative differences between the sample
sounds.

If the difference between the effects of sUAS noise and road vehicle noise is qualitative, then the primary
implication is that the use of contemporary noise metrics for the evaluation of sUAS noise may have to include
a significant component for the qualitative aspects of sUAS noise vis-à-vis the noises for which those metrics
are commonly used. Those who expect sUAS to gain widespread community acceptance based on the
idea that they will produce no more annoyance than the equivalent amount of traffic noise may not be
correct.

An important caveat is that, as stated in the introduction, this test was not conceived to be a
comprehensive examination of noise from either sUAS or road vehicles. Rather, it was meant, primarily, to
demonstrate the extensibility of tools and facilities that NASA already possesses to the realm of sUAS
noise. Therefore, it is unwise to attempt to generalize the results of this study beyond those stated in
the discussion, and beyond the limited set of vehicles and conditions tested.


G. Regarding Height


Finally, an interesting effect has been noticed in the data, though not completely objectively explored at
the time of writing. For sUAS samples between which only height varies (where the other parameters
of the operations are held constant) there is usually insignificant change in the annoyance response.
An example of this can be seen in Figure 12. The figure shows the annoyance responses for the 2-bladed
SUI vehicle flown at four different altitudes, from 20 to 100 m AGL. There is no significant difference
between the annoyance responses (the confidence intervals all overlap), even though there is roughly an 8
dB difference in SELa between the samples. Similar trends exist for the other sets of sUAS sounds in which
parameters other than height are held nominally constant.


Figure 12. The effect of changing height on annoyance. The four samples shown are all from the SUI Endurance
equipped with 2-blade props, flying at 5 m/s, and at different heights AGL (IDs 1, 5, 17, and 20). [Plot: annoyance response versus SELa [dB], with one marker per height.]


There are two important points to be made regarding this observation:


1. Alleviating annoyance from sUAS operations may not be simply a matter of flying higher. The maximum
dBa of a flyover would be expected to reduce by 6 dB every time the distance between the source
and receiver doubles. SELa, as a time-integrated metric, would be expected to reduce less than this
per doubling of distance, as can be seen in Fig. 12.


A common comment from subjects was that sounds which appeared to ‘loiter’ were judged more harshly
than those that did not.


2. This result could, at least partially, work toward explaining the offset between sUAS and road vehicle
annoyance observed in the regression results. The road vehicles were all recorded at (nominally) 10
m distance, and moving at around 10 m/s. This means that while the closest/fastest sUAS recordings
were geometrically comparable to all of the road vehicles, the vast majority of sUAS recordings were
farther/slower. The results in Fig. 12 suggest that the addition of a loitering penalty, formulated only
from geometrical considerations, and within the limits of this dataset, will work to alleviate some, if
not all, of the observed difference between sUAS and the road vehicles.


Given that the result of Fig. 12 seems to indicate that this penalty ought to be able to explain a
difference in annoyance corresponding to 8 dB SEL_A, and the offset as measured in SEL_A can be stated
with confidence only up to 4.3 dB, it is not unreasonable to assume that the proper formulation of such
a loitering correction could account for the entirety of the significant region of the offset.
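
The spreading argument in point 1 can be checked with an idealized model. The sketch below assumes a
point source passing directly overhead in a straight line at constant speed, with spherical spreading only.
It ignores atmospheric absorption, ground reflections, source directivity, and A-weighting, and its function
names are illustrative rather than taken from the analysis used for this test. Under these assumptions the
squared pressure at the receiver scales as 1/(h^2 + (v t)^2), whose integral over all time is pi/(h v).

```python
import math


def lmax_drop_db(h_ref, h):
    """Drop in maximum level (dB) when height goes from h_ref to h."""
    # Spherical spreading: squared pressure at closest approach ~ 1/h^2.
    return 20.0 * math.log10(h / h_ref)


def sel_drop_db(h_ref, h, v_ref=1.0, v=1.0):
    """Closed-form drop in exposure level (dB) for an overhead pass."""
    # Integral of 1/(h^2 + (v*t)^2) over all t equals pi/(h*v).
    return 10.0 * math.log10((h * v) / (h_ref * v_ref))


def exposure_numeric(h, v, t_max=2000.0, n=200_000):
    """Midpoint-rule integral of 1/(h^2 + (v*t)^2) over [-t_max, t_max]."""
    dt = 2.0 * t_max / n
    total = 0.0
    for i in range(n):
        t = -t_max + (i + 0.5) * dt
        total += dt / (h * h + (v * t) ** 2)
    return total


def sel_drop_numeric(h_ref, h, v=5.0):
    """Numerical cross-check of sel_drop_db at a common speed v (m/s)."""
    return 10.0 * math.log10(exposure_numeric(h_ref, v) / exposure_numeric(h, v))


if __name__ == "__main__":
    # Heights in metres, relative to a 20 m reference pass.
    for h in (20.0, 40.0, 100.0):
        print(h, round(lmax_drop_db(20.0, h), 2), round(sel_drop_db(20.0, h), 2))
```

In this model the maximum level falls by about 6 dB per doubling of height while the exposure level falls by
about 3 dB, so going from 20 m to 100 m gives roughly 14 dB in maximum level but only about 7 dB in exposure
level, the same order as the roughly 8 dB spread noted for Fig. 12. The same expression shows that exposure
level also rises by about 3 dB when speed is halved, which is one geometric quantity a loitering correction
might draw on.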


The result of equal annoyance with distance should not be construed as being extensible beyond the 
bounds of this test. It is highly likely that, given that the car sounds were the fastest/closest, and the 
subjects were known to (at least in part) develop cognitive heuristics to judge sounds, any heuristic penalty 
associated with duration would be anchored by the car sounds and then extended as a penalty to those that 
are slower/farther (all of the sUAS sounds). Likewise, it is possible, given that cars driving by are a common
occurrence in many people’s daily lives (especially in Hampton Roads, VA, from which all of the test subjects
were drawn), and car ownership is necessary for many there as well, that proximate road noise constitutes
a sort of cognitive baseline for acceptable noise. Lastly, on the opposite end of the spectrum, it is known
that there is a startle-related onset penalty for noises that rise in intensity too rapidly, so that the concept 
of a time-based correction to noise beyond that provided by a simple time integration of the signal, is not 
unprecedented.?? 

This possibility will be the subject of further data analysis and research. Explaining the offset in terms
of a loitering penalty would not exonerate sUAS noise from the previous discussion; rather, the implication
is that for sUAS operators to compete on a level playing field noise-wise, speeding up their operations
may be the key to making their noise acceptable. This stands in contrast to the idea that there is a
qualitative component of sUAS noise that is offensive and not (as) present in road vehicle noise.


H. Future Analyses 


The path forward for this work is clear in the near term. Analysis of the current data set will continue along
three fronts, related to the three research questions outlined previously:


1. Analysis of variance on the samples that were presented in repetition to the subjects. These include the
SUI repeats (both where identical sounds were repeated and where identical conditions were repeated) and
the sUAS auralization samples. This will primarily shed light on the inter- and intra-subject components
of the variance in the dataset and help to inform the design of future such tests.


2. Factor analyses using ANOVA. This can be used to explore relationships (or the lack thereof) between 
the ‘factors’ of the samples (e.g., speed and distance) and the resultant annoyance. This work will 
inform component-level design considerations for sUAS with regard to noise.


3. Work to increase the explanatory power of the noise metrics employed for this effort. This will
encompass any effort to produce a loitering correction as described above, but can also be used to bring
further psychoacoustic measures (such as tonality) to bear. This work would go toward creating tools
that can predict annoyance due to a wide range of sUAS operations noise.
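
The first item, separating inter-subject from intra-subject variance for repeated samples, can be sketched
as a one-way analysis of variance with subject as the grouping factor. The ratings below are made-up
placeholder values, not data from this test, and the function name is hypothetical. The actual analysis may
well use a different model, for example one that also accounts for differences between samples.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a dict of rating lists."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # scatter of group means about the grand mean
    ss_within = 0.0   # scatter of repeats about their own group mean
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# Hypothetical annoyance ratings: three subjects, each rating the same
# repeated sound three times (placeholder values for illustration only).
ratings = {
    "subject_1": [4.0, 5.0, 4.5],
    "subject_2": [6.0, 6.5, 7.0],
    "subject_3": [3.0, 2.5, 3.5],
}

ss_b, ss_w, f_stat = one_way_anova(ratings)
print(f"between-subject SS = {ss_b:.2f}, within-subject SS = {ss_w:.2f}, F = {f_stat:.1f}")
```

A large F in this setup indicates that differences between subjects dominate the scatter across repeats.
Converting F to a p-value requires the F distribution, which is left out of this sketch.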

V. Conclusions


This paper describes the results of a recent psychoacoustic test at NASA Langley to explore differences
in subjective response between noise from flyovers of small unmanned aerial systems (sUAS) and noise from
drive-bys of road vehicles encountered in residential neighborhoods. Recordings of the various vehicles were
collected during the second half of 2016 and early 2017. The recordings were used, along with auralizations 
previously created for other tests/purposes, in a psychoacoustic test in February 2017. This test took place 
in the Exterior Effects Room, a calibrated 3D sound environment and human subject test facility. Subjects 
provided holistic responses regarding their levels of annoyance to the various test sounds. Data from 38 
subjects for all test sounds were collected. 

Initial analysis of the data from this test indicates that there may be a systematic difference between 
the annoyance response generated by the noise of the sUAS and the road vehicles included in this study. It 
is unknown as of now whether this difference can be accounted for by other factors, or whether it is being 
generated by qualitative differences between the sound of road vehicles and sUAS. This result casts doubt on 
the idea that sUAS operators can expect their operations to be greeted with minimal noise-based opposition 
as long as the sound of their systems is “no louder than” conventional package delivery solutions.

Further analysis of the data is ongoing, including factor analysis and an examination of whether the
penalty results from a loitering sensation produced by the sUAS operations. A follow-on
test, informed by the results of this test, is being planned for later in 2017.